arXiv:1503.02592vl [cs.DS] 9 Mar 2015 


MATHEMATICS OF COMPUTATION 
Volume 00, Number 0, Pages 000-000 
S 0025-5718(XX)0000-0 


TWO COMPACT INCREMENTAL PRIME SIEVES 

JONATHAN P. SORENSON 


Abstract. A prime sieve is an algorithm that finds the primes up to a bound 
n. We say that a prime sieve is incremental, if it can quickly determine if n + 1 
is prime after having found all primes up to n. We say a sieve is compact if it 
uses roughly n space or less. In this paper we present two new results: 

• We describe the rolling sieve, a practical, incremental prime sieve that 
takes 0(n log log n) time and 0(y/n log n) bits of space, and 

• We show how to modify the sieve of Atkin and Bernstein [1] to obtain a 
sieve that is simultaneously sublinear, compact, and incremental. 

The second result solves an open problem given by Paul Pritchard in 1994 [19]. 


1. Introduction and Definitions 

A prime sieve is an algorithm that finds all prime numbers up to a given bound n. 
The fastest known algorithms, including Pritchard’s wheel sieve [16] and the Atkin- 
Bernstein sieve [1], can do this using at most 0(n/log log n) arithmetic operations. 
The easy-to-code sieve of Eratosthenes requires 0(n log log n) time, and there are 
a number of sieves in the literature that require linear time [17, 18]. 

Normally, running time is the main concern in algorithm design, but in this paper 
we are also interested in two other properties: incrementality and compactness. 

We say that a sieve is compact if it uses at most n 1 ^ 2+ °^ 1 ' > space. Bays and 
Hudson [4] showed how to segment the sieve of Eratosthenes so that only 0(^/n) 
bits of space are needed. (See also [6].) Pritchard [17] showed how to apply a fixed 
wheel to get a compact sieve that runs in linear time. Atkin and Bernstein [1] gave 
the first compact sieve that runs in sublinear time. For other sieves that address 
space issues, see for example [12, 11, 13, 20, 9, 21]. 

Loosely speaking, a sieve is incremental if it can determine the primality of ?z +1 
after having found all primes up to n using a small amount of additional work. 
Bengelloun [5] presented the first prime sieve that needed only a constant amount 
of additional work to do this, thereby taking a total of 0(n ) time to find all primes 
up to n. Pritchard [19] showed how to improve Bengelloun’s sieve using a dynamic 
wheel so that in constant time it determines the the primality of all integers from n 
to n + ©(loglogn). In other words, it takes 0(1 + (p — n)/ loglogn) time to find p , 
if p is the smallest prime exceeding n. Pritchard’s sieve takes 0(n/loglog?r) time 
to find all primes up to n. 
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To clarify, let us define a sieve as t(n)-incremental if, in the worst case, it requires 
(p — n) ■ t(n) + 0(1) operations to find p = p n ( n )+i, the smallest prime exceeding n, 
after having found all primes up to n. Thus, Bengelloun’s sieve is 0(l)-incremental, 
and Pritchard’s sieve is 0(1/log log n)-incremental. 

Open Problem. Design a prime sieve that is both compact and o(l) -incremental. 

[19] 

In this paper, we address this problem from two perspectives, one practical, and 
one theoretical: 

• We present a sieve algorithm, with pseudocode, that finds the primes up to 
n using 0(nlog logic) arithmetic operations and 0(^/n\ogn) bits of space, 
that is 0(log n/ log log n)-incremental. It is based on the segmented sieve of 
Eratosthenes, adapting some of the ideas in Bennion’s sieve [11]. Our sieve 
uses a circular array of stack datastructures and works well in practice. 

• We also prove the following: 

Theorem 1.1. There exists a prime sieve that is both compact and O(l/loglogn)- 
incremental. 

Our proof relies on modifying the sieve of Atkin and Bernstein [1]. 

After we discuss some preliminaries in §2, we present our rolling sieve in §3 and 
we prove our theorem in §4. 


2. Preliminaries 

In this section we discuss our model of computation, review some helpful esti¬ 
mates from elementary number theory, and review the sieve of Eratosthenes. 

2.1. Model of Computation. Our model of computation is a RAM with a po¬ 
tentially infinite, direct access memory. 

If n is the input, then all arithmetic operations on integers of O(logic) bits have 
unit cost. This includes +, —, x, and division with remainder. Comparisons, array 
indexing, assignment, branching, bit operations, and other basic operations are also 
assigned unit cost. Memory may be addressed at the bit level or at the word level, 
where each word has 0(log n) bits. 

Space is measured in bits. Thus, it is possible for an algorithm to touch n bits 
in only 0(n/log n) time if memory is accessed at the word level. The space used 
by the output of a prime sieve, the list of primes up to n, is not counted against 
the algorithm. 

This is the same model used in [9, 20, 21]. 

2.2. Some Number Theory. We make use of the following estimates. Here the 
sums over p are over primes only, and x > 0: 


(2.1) 

A x ) : = £ 1 = 

p<x 

r^-(l + o(l)); 
logic 

(2.2) 

"e I ^ 

II 

log log x + 0(1); 

(2.3) 

£i°g^ = 

x(l + o(l)); 




For proofs, see Hardy and Wright [14], 
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2.3. Sieve of Eratosthenes. We present algorithms using a C++ style that should 
be familiar to most readers. We assume that integer variables can accurately hold 
any integer represented in standard signed binary notation in O(logn) bits. In 
practice, this means 64 bits. 

Recall that the sieve of Eratosthenes uses a bit vector to represent the primes 
up to n. All are assumed to be prime (aside from 0 and 1) and, for each prime p 
found, its multiples q are enumerated and ’’crossed off’ as not prime, as follows: 

BitVector S(n+1); // bit vector for 0..n 
S.setAll(); 

S [0] =0; S [1] =0; 
for(p=2; p*p<=n; p=p+l) 

if(S[p]==l) // so that p must be prime 

for(q=2*p; q<=n; q=q+p) 

S [q] =0; 

This requires = 0(nloglog?i) arithmetic operations (using (2.2)) and 

0{n) bits of space. 

2.4. Segmented Sieve of Eratosthenes. The main loop of a segmented sieve is 
simply a loop over each segment. Here A is the size of that segment; we choose A 
to be proportional to i/n. We assume here the primes below y/n. have already been 
found some other way (perhaps with the unsegmented sieve above). 

for(left=sqrt(n)+l; left<=n; left=left+delta) 

{ 

right=min(left+delta-l,n); 
sieve(left,right); 

> 

Each segment is then sieved by crossing off multiples of each prime p below yfn.. 

Primelist P(sqrt(right)); // list of primes <= sqrt(n) 

BitVector S(delta); // bit vector for left..right 
S.setAll(); 

for(i=0; i<P.length; i++) 

{ 

p=P [i] ; 

first=left+(p-(left 0 / 0 p))°/op; // min. first>=left st. plfirst 
for(q=first; q<=right; q=q+p) 

S[q-left]=0; 

> 

Using (2.2), the running time for a single segment is proportional to 

+ - ° ( A10S10S,! + logn) 

p<y/n 

and summing this cost over n/A segments gives 0{n log log n). The total space 
used is 0( v / n) bits; this is dominated by space for the the list of primes up to \Jn. 
(using (2.3)), plus A bits for the segment bit vector. 


Remarks. 
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• Both sieves can readily be modified to completely factor all integers up to 
n by replacing the bit vector with an array of lists, and adding p to the list 
for each q generated by the inner loop. 

• Pritclrard[17] added a static wheel to the segmented sieve to obtain an 0(n) 
running time. For a discussion of the static wheel, see [20, §2.4], which gives 
C++-style pseudocode and a running time analysis. 

• For a parallel version of the segmented sieve, see [22], 

3. Algorithm Description 

In this section we present our rolling sieve. It is in many ways similar to the 
hopping sieve of Bennion [11]. 

The primary data structure is a circular array of stacks. Each array location 
corresponds to an integer n, and when n is reached by the sieve, its stack will 
contain the prime divisors of n up to ^fn. . If the stack is empty, then either n is 
prime or the square of a prime. In the latter case, we discover a new prime by 
which to sieve, r, so we push r onto n + r’s stack. 

When moving from nton+1, we pop each prime p from n’s stack and push it 
onto the stack for n+p, the next stack position corresponding to an integer divisible 
by P- 

If the position for n + p is larger than A, the size of the circular array, we simply 
wrap around to the front. 

We are now ready to present pseudocode. 

3.1. Precomputation. Let start be the value of the first integer we wish to test 
for primality. To set up, then, we find the primes up to -y/start and push them 
onto the correct stacks. Note that array position 0 corresponds to start as we 
begin. 

r=floor(sqrt(start))+l; 
s=r*r; 

PrimeList P(r-l); 
delta=r+2; 

StackArray T(delta); 
for(i=0; i<P.length; i=i+l) 

{ 

p=P [i] ; 

j = (p-(start%p))%p; 

T[j].push(p); 

> 

pos=0; n=start; 

We have [-y/n] = r — 1 = A — 3 so that n < s = r 2 and r + 1 < A hold. 

3.2. Primality of n. Once the array of stacks is set up, we can check each succes¬ 
sive integer n for primality as shown in the function below, which returns true if n 
is prime, and false otherwise. It also sets its state to be ready to handle n + 1 on 
the subsequent call. 

bool nextQ 

{ 

isPrime=true; 

while(!T[pos].isEmptyO) // process prime divisors 
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{ 

p=T[pos].pop(); 

T [(pos+p) "/delta] .push(p) ; 
isPrime=false; 

> 

if(n==s) // then n is a square 

{ 

if(isPrime) // then r is in fact prime 

{ 

T[(pos+r) “/delta] .push(r) ; 
isPrime=false; 

} 

r=r+l; s=r*r; 

> 

n=n+l; 

pos= (pos+1) “/delta; 
if(pos==0) { delta=delta+2; } 
return isPrime; 


• The list of primes stored in stacks includes all primes up to y/n,, unless n is, 
in fact, the square of a prime. If this is the case, we detect that r is prime 
and put it into the appropriate stack. This is how we incrementally grow 
the list of primes up to ^Jn. for sieving. 

• We can grow A and the array of stacks over time by simply adding two 
to A when pos reaches zero. We make sure no prime stored in the stacks 
exceeds A, or in other words, r < A is invariant. The new stacks added to 
the end of the array are initialized to empty. 

Note that if we start A = + 3, and then after A function calls 

increment by 2, we end up iterating the mapping (n, A) —> (n + A, A + 2). 
Iterating k times gives (n, A) => (n + kA + k(k — 1), A + 2k). Squaring 
A + 2 k shows that the segment stays larger than over time, and the 
k(k — 1) term insures it won’t exceed 0(y/n). 

Extending an array of stacks in the RAM model discussed in §2 is 
straightforward, but in practice it is not so simple. One option is to use a 
fixed value for A, only use primes up to A in the stacks, and then numbers 
that seem prime should be prime tested as an extra step. See, for example, 
the ideas in [21]. 

• To find all primes up to a bound n, simply print the primes up to 100, say, 
then set up using start = 100, and repeatedly call the next() function, 
printing when it returns true. 

int nextprimeO 

{ while! InextO) ; return n-1; } 


• The sieve can readily be modified to generate integers in factored form. 
Simply make a copy of n, and as the primes are popped from the stack, 
divide them into n. When the stack is empty, what remains is either 1 or 
prime. We leave the details to the reader. 
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An early version of this algorithm was used for this purpose in generating 
data published in [3]. 

• The array of stacks datastructure used here can easily be viewed as a 
special-case application of a monotone priority queue. In this view, each 
prime up to \fri is stored in the queue with its priority set to the next 
multiple of that prime. See, for example, [7]. 


3.3. Analysis. 

Theorem 3.1. The rolling sieve will find, allprimes up to a boundn using 0(n\og\ogn) 
arithmetic operations and 0(^/n\ogn) bits of space. 


Proof. The running time is bounded by the number of times each prime p up to 
yjn. is popped and pushed. But that happens exactly once for each integer up to n 
that is divisible by p, or 


E 


p<y/n 


n 

p 


0(n log logn). 


Assuming a linked list style stack data structure, the total space used is one machine 
word for the number of stacks, A = 0(y/n), plus the number of nodes, n(y/n). At 
0(log n) bits per word, we have our result. □ 


Theorem 3.2. The rolling sieve is 0(log n/ log log n)-incremental. 

Proof. It should be clear that we need to count the total number of prime divisors 
of the integers from n to p , where p is the smallest prime larger than n. 

Let £ = p — n. First we consider primes q < t. Each such prime q divides 0(£/q) 
integers in this interval, for a total of 0{t log log ^). 

Next we consider primes q > £. Each such prime can divide at most one integer 
between n and p. Summing up the logarithms of such primes, we have 

i e log q = 0{£\ogri). 

m=n q\m,q>t 

The number of such primes q is maximized when the primes are as small as possible. 
Again, each prime can only appear once in the sum above, so we solve the following 
for x: 

log q = 0{t log n) 

£<q<x 

We have x = 0(£\ogn) by (2.3), and so the number of primes is bounded by 
tt(x) = 0(£\ogn/(log£ + log logn)). 

Dividing through by l completes the proof. □ 


Remarks. 

• For the purposes of analysis, and to limit the space used, our stacks are 
linked-list based. For speed, array-based stacks are better. For 64-bit 
integers, an array of length 16 is sufficient. 

• Although we can only prove the rolling sieve is 0(log n/ log log n)-incremental, 
by the Erdos-Kac theorem, in practice it will behave as if it is O(loglogn)- 
incremental. See, for example, [10]. 
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• One could add a wheel to reduce the overall running time by a factor 
proportional to log log n, and similarly improve incremental behavior. We 
have not tried to code this, and growing the wheel by adding even a single 
prime, in an incremental way, seems messy at best. From a theoretical 
perspective, the question is moot as we will see in the next section. 

4. The Theoretical Solution 

In this section we prove Theorem 1.1. 

Our proof makes use of the sieve of Atkin and Bernstein [1], In particular, we 
make use of the following properties of this algorithm: 

• The sieve works on segments of size A, where A ss y/n., and has a main 
loop similar to the segmented sieve of Eratosthenes discussed in §2.4. 

• The time to determine the primality of all integers in the interval from n 
to n + A is 0(A/loglog?r) arithmetic operations. 

• Each interval can be handled separately, with at most O(A) extra space of 
data passed from one interval to the next. 

Any sieve that had these properties can be used as a basis for our proof. 

Proof. We maintain two consecutive intervals of information. 

The first interval is a simple bit vector and list of primes in the interval n to 
n + A. Calls to the nextO function are answered using data from this interval in 
constant time. We can also handle a nextprimeO function that jumps ahead to 
the next prime in this interval in constant time using the list. 

The second interval, for n + A to n + 2A, is in process; it is being worked on by 
the Atkin-Bernstein sieve. We maintain information that allows us to start and stop 
the sieve computation after any fixed number of instructions. When this interval 
is finished, it creates a bit vector and list of primes ready for use as the new first 
interval. 

So, when a call to the nextO function occurs, after computing its answer from 
the first interval, a constant amount of additional work time is invested sieving the 
second interval; enough that after A calls to nextO, the second interval will be 
completed. 

When a call to the nextprimeO function occurs, we compute the distance to 
that next prime, £, and invest 0(1 Tt'/loglogn) time sieving the second interval 
after computing the function value. 

In this way, by the time the first interval has been used up, the second interval 
has been completely processed, and we can replace the first interval with the second, 
and start up the next one. □ 

Remarks. 

• The proof is a bit artificial and of theoretical interest only. Coding this in 
practice seems daunting. 

• Any segmented sieve, more or less, can be used to make this proof work. 
In particular, if a compact sieve that is faster, or uses less space, can be 
found, such a sieve immediately implies a version that is also incremental. 

• Pritchard has also pointed out that additive sieves, ones that avoid the use 
of multiplication and division operations, are desirable due to the fact that 
hardware addition is much faster than multiplication or division. From a 
theoretical perspective, this is moot since a multiplication table together 
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with finite-precision multiplication and division routines can reduce all 
arithmetic operations on 0(logn)-bit integers to additions. 

That said, see [19, 16]. 
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